A Strategy for Efficient Crawling of Rich Internet Applications

Authors

Kamara Benjamin

Gregor von Bochmann

Mustafa Emre Dincturk

Guy-Vincent Jourdan

Iosif-Viorel Onut

Abstract

This thesis studies the problem of crawling rich internet applications. These applications are built with advanced web technologies that make them more dynamic and enable better user experiences. In recent years, the popularity and importance of web applications have steadily increased, and they are now commonly used to complete essential tasks such as financial transactions. As a result, the need to crawl these applications goes beyond the desire to index content for search. For example, applications also need to be analyzed to detect security vulnerabilities and assess accessibility. This thesis discusses the challenges involved in crawling rich internet applications and presents an efficient strategy for crawling them. The strategy is also used to develop a prototype tool for crawling AJAX-based applications.

Similar resources

A Statistical Approach for Efficient Crawling of Rich Internet Applications

Modern web technologies, like AJAX, result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. This new strategy is designed based on the concept of “Model-Based Crawling” introduced in [3] and uses statistics accumulated during the cr...

A Statistical Approach for Efficient Crawling of Rich Internet Applications1

Building Rich Internet Applications Models: Example of a Better Strategy

Crawling “classical” web applications is a problem that was addressed more than a decade ago. Efficient crawling of web applications that use advanced technologies such as AJAX (called Rich Internet Applications, RIAs) is still an open problem. Crawling is important not only for indexing content, but also for building models of the applications, which is necessary for automated testing, au...

GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications

Crawling web applications is important for indexing, accessibility and security assessment. Crawling traditional web applications is an old problem, for which good and efficient solutions are known. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates only make the problem of craw...

Indexing Rich Internet Applications Using Components-Based Crawling

Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines with DOMs as states and JavaScript events execution as transitions. This approach fails when used with “real-life”, complex RIAs, because the size of the produced model is m...

Publication date: 2011
